Nature Genetics
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Parent-of-origin effects (POEs) occur when the impact of a genetic variant depends on its parental origin. Traditionally linked to genomic imprinting, these effects are believed to have evolved from parental conflict over resource allocation to offspring, which results in opposing parental genetic influences. Despite their potential importance, POEs remain heavily understudied in complex traits, largely due to the lack of parental genomes. Here, we present a multi-step approach to infer the par...
Most genetic variants associated with complex traits and diseases occur in non-coding genomic regions and are hypothesized to regulate gene expression. To understand the genetics underlying gene expression variability, we characterize 14,324 ancestrally diverse RNA-sequencing samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and integrate whole genome sequencing data to perform cis and trans expression and splicing quantitative trait locus (cis-/trans-e/sQTL) analyses in...
Genome-wide association studies (GWAS) have identified thousands of disease-associated loci, yet their interpretation remains limited by the heterogeneity of underlying biological processes. We propose Joint Pleiotropic and Epigenomic Partitioning (J-PEP), a clustering framework that integrates pleiotropic SNP effects on auxiliary traits and tissue-specific epigenomic data to partition disease-associated loci into biologically distinct clusters. To benchmark J-PEP against existing methods, we in...
Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To address this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations, and estimate ancestry-specific expressio...
Despite the great success of genome-wide association studies (GWAS) in identifying genetic loci significantly associated with diseases, the vast majority of causal variants underlying disease-associated loci have not been identified [1-3]. To create an atlas of causal variants, we performed and integrated fine-mapping across 148 complex traits in three large-scale biobanks (BioBank Japan [4,5], FinnGen [6], and UK Biobank [7,8]; total n = 811,261), resulting in 4,518 variant-trait pairs with high posterior ...
Understanding the genetic basis of gene expression can shed light on the regulatory mechanisms underlying complex traits and diseases. Single cell-resolved measures of RNA levels and single-cell expression quantitative trait loci (sc-eQTLs) have revealed genetic regulation that drives sub-tissue cell states and types across diverse human tissues. Here, we describe the first phase of TenK10K, the largest-to-date dataset of matched whole-genome sequencing (WGS) and single-cell RNA-sequencing (scRN...
Alternative splicing is a key mechanism by which genetic variation contributes to human phenotypic diversity and disease risk, yet its incorporation into large-scale genetic studies remains limited. Here we utilize blood RNA-seq from 4,732 European-ancestry individuals in the INTERVAL cohort to construct genetic scores for junction-based splicing phenotypes, 13,851 of which showed R² > 0.01 in withheld individuals. These models were used to predict splicing in the UK Biobank White-British subset...
Genomic imprinting involves parent-of-origin effect (POE) of regulatory element activity, often measured through methylation of CpG (5-mC) dinucleotides. While a dozen clinical syndromes are linked to defective imprinting, the extent to which this epigenetic phenomenon is linked to phenotypic variation and disease susceptibility remains undetermined. We show long-read HiFi genome sequencing for single-molecular profiling of 5-mC, together with pedigree-based phasing in early developmental tissue, provide...
Leveraging data from multiple ancestries can greatly improve fine-mapping power due to differences in linkage disequilibrium and allele frequencies. We propose MultiSuSiE, an extension of the sum of single effects model (SuSiE) to multiple ancestries that allows causal effect sizes to vary across ancestries based on a multivariate normal prior informed by empirical data. We evaluated MultiSuSiE via simulations and analyses of 14 quantitative traits leveraging whole-genome sequencing data in 47k ...
Polygenic scores (PGS) predict complex traits and stratify disease risk but often fail to fully capture individual-level variation. "Misaligned" individuals, whose observed phenotypes deviate from the values expected from their PGS, provide a powerful model for identifying factors beyond common-variant effects, including additional genetic factors. Here, we apply misalignment classification and enrichment testing frameworks to seven continuous and three dichotomous...
Most genetic variants associated with complex diseases lie in non-coding regions, yet mechanistic insights have been limited by the lack of an empirical framework for characterizing the molecular consequences of regulatory variation. Single-cell profiling of molecular quantitative trait loci (QTL) can connect variants to gene regulation, but prior studies lacked the sample size to detect variants at disease-relevant genes and the simultaneous measurements across regulatory layers needed to trace...
Polygenic diseases challenge genetic risk prediction due to extreme dimensionality, low per-variant effect sizes, and non-additive interactions. Conventional marginal P-value-based methods potentially overlook subtle signals and complex dependencies, while inefficient random sampling in ensembles misses sparse signals. We introduce ELAG, an ensemble learning framework that advances feature bagging by reformulating variant selection as an approximate reinforcement learning problem. Leveraging pol...
Colocalisation analysis is extensively applied across diverse GWAS and molecular QTL datasets to identify candidate causal genes. We systematically characterised large-scale colocalisation results across eQTL studies varying in cellular granularity and sample size, with the goal of providing design and interpretation recommendations. We found that 34-50% of GWAS hits colocalised, and that hits were more likely to colocalise if they were located nearer genes and had a more common lead variant. We also found ove...
Mapping the pleiotropic effect of genetic variation on biological processes and complex phenotypes is fundamental to extracting translational insight from genome-wide association studies (GWAS). Here we present The Human Genotype-Phenotype Map (GPMap), a repository of colocalizing genetic associations across 15,997 complex traits and 2.7 million molecular measurements, leveraging common and rare variants and cis- and trans-acting effects across disaggregated tissue types and single cell datasets ...
Large biobanks, such as the UK Biobank (UKB), enable massive phenome by genome-wide association studies that elucidate genetic etiology of complex traits. However, individuals from diverse genetic ancestry groups are often excluded from association analyses due to concerns about population structure introducing false positive associations. Here, we generate mixed model associations and meta-analyses across genetic ancestry groups, inclusive of a larger fraction of the UKB than previous efforts, ...
Studying the genetics of measures of intelligence can help us understand the neurobiology of cognitive function and the aetiology of rare neurodevelopmental conditions. The largest previous genetic studies of measures of intelligence have used ~270k individuals who completed the fluid intelligence (FI) test in UK Biobank. Here, we integrate additional FI measures in this cohort and leverage eighty-two correlated variables to impute FI values for unmeasured individuals, increasing the sample si...
Genome-wide association studies (GWAS) have implicated tens of thousands of genetic variants associated with complex traits and polygenic diseases. Colocalizing GWAS variants with variants that may regulate gene expression, via expression quantitative trait loci (eQTL) mapping, has successfully led to the identification of disease-critical genes and their cell types of action. Recent studies predominantly colocalize proximal cis-eQTLs, which are estimated to regulate ~10% of variance in gene e...
Genetic variants can influence complex traits by regulating gene products such as gene expression, RNA splicing, and protein abundance. While molecular quantitative trait loci (molQTL) are widely leveraged to infer causal genes, their overall contribution to complex trait variation remains unclear. Here, we systematically evaluate the contributions of expression (eQTL), splicing (sQTL), and protein (pQTL) QTL in blood to SNP-based heritability and polygenic prediction across 27 complex traits. U...
Transcriptome-wide association studies (TWAS) link genes to disease risk by integrating gene expression with genome-wide association study (GWAS) data, where the use of bulk-tissue expression data typically provides gene-disease association interpretations at tissue levels. Recently, the increasing availability of single-cell gene expression profiles provides an opportunity to dissect these associations at finer cellular granularity, allowing identification of cell-level effects that are not ...
The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC pr...